A FORMALIZATION OF FLOATING POINT NUMERIC BASE CONVERSION


I. INTRODUCTION AND SUMMARY


The necessity of base conversion of numeric data during some stages of computation on a digital computer is a de facto component of practical numeric computation that must be recognized in any complete analysis of digital computation. On the hardware level, the trend towards establishment of computer networks with possibly differently based machines and, on the software level, the mixed-base flexibility inherent in the PL/I language specifications both suggest that internal data of certain jobs may necessarily be subjected to multiple conversions before job termination. Thus, references to a purportedly constant floating point datum occurring at different points during program execution might encounter altered datum values. Both hardware and software designers must recognize this problem, and each can benefit from the fundamental principles obtained by considering base conversion as a mathematical transformation. In this report we shall follow the notation of our previous work [1-4], integrating the mainstream of results from those articles with a general formal development of conversion, and providing new results, particularly in the area of multiple conversions.

A fundamental analysis of base conversion is concerned first with determination of the theoretical limitations inherent in any implementation of base conversion. The actual algorithmic mechanics of any theoretically realizable conversion procedure is then a secondary (albeit non-trivial [5]) problem which will not concern us in this article. The formalization we introduce provides the vehicle for studying the properties and recognizing the inherent anomalies of the conversion process, which must necessarily then guide the performance specifications for floating point representations and base conversions at both hardware and software levels.

Converting integer and fixed point data to an "equivalent" differently based number system is generally achieved by utilizing essentially $\log_\delta \beta$ times as many digits in the new base $\delta$ as were present for representing numbers in the old base $\beta$ system. This simplified notion of equivalence does not extend to the conversion of floating point systems. Actually, conversion between floating point number systems introduces subtle difficulties peculiar to the structure of these systems, so that no such convenient formula for equating the "numbers of significant digits" is even meaningful. Thus, our formalization of floating point base conversion is preceded in section 2 by a careful analysis of floating point number systems.

Following our previous work [1-4], a system of floating point numbers of n significant digits to the base $\beta$ is characterized as a significance space, $S_\beta^n$, and in theorem 1 the number of elements in $S_\delta^m$ relative to the number of elements in $S_\beta^n$ within a specified interval is shown to converge to $\bigl((\delta-1)\delta^{m-1}/((\beta-1)\beta^{n-1})\bigr)\log_\delta \beta$ as the interval grows to include the whole real line. This relative density of "total membership" of two significance spaces provides only a gross comparison of the two number systems, for actually there is considerable local variation in the relative density. The gap function [1], $\Gamma_\beta^n(x)$, is defined as the relative difference between nearest neighbors of $S_\beta^n$ at x. A comparison of the graphs of $\Gamma_\beta^n$ and $\Gamma_\delta^m$ provides more insight into the comparability of two differently based floating point systems than any simplified "equivalent digit" formula.

Having characterized floating point number systems and the gap function, the conversion of the real numbers into $S_\beta^n$ both by rounding and by truncation procedures is then formalized in section 3. The order preserving properties of these conversion mappings are detailed, and the Base Conversion Theorem [2] is stated, which gives the necessary and sufficient conditions for a conversion mapping of $S_\beta^n$ to $S_\delta^m$ to be one-to-one and onto.

The important problems associated with multiple conversions of a datum are analysed in section 4. Compound conversions, i.e. successive conversions through a sequence of significance spaces, are formally introduced, and the associated invariant points (i.e., the points mapped into themselves) of such a compound conversion mapping are analysed. For a compound truncation conversion it is shown that the only invariant points of the mapping are the members common to all of the significance spaces involved. Thus, considerable importance must be attached to the intersection of significance spaces, and in theorems 10 and 11 these intersections are shown to exhibit a structure that depends on whether the bases involved are commensurable. Significance spaces with commensurable bases will always jointly contain a common significance space. For example the 6-digit hexadecimal numbers and the 8-digit octal numbers both contain all 21-bit binary numbers. On the other hand the members common to significance spaces with incommensurable bases (e.g. binary and decimal) will be finite in number and, for cases of computational interest, these members will typically all fall in an interval much smaller than the interval range provided by current exponent ranges on digital computers.

As a consequence of the limited membership of points common to two incommensurable significance spaces, the multiple back-and-forth conversion of a "constant datum" between two incommensurable significance spaces by truncation conversion can accumulate error so as to invalidate even the leading digit of the value of this constant datum. Practically, the process of updating a B.C.D. tape on a binary machine might well subject stored data which is never updated to multiple binary-decimal conversions, so B.C.D. tape updating is very sensitive to such anomalies of compound conversions. Fortunately, we can show that under very general conditions iterated rounding conversion of a datum between two significance spaces quickly generates a stable pair of values, each of which is a reasonable approximation of the original datum. Under the stronger conditions given in the In-and-Out Conversion Theorem, rounding conversion through an intermediate significance space can be guaranteed to regenerate the initial datum of the original significance space.

In the presence of several incommensurable bases, the possibility of accumulating error in a datum is shown to exist even under rounding conversion. Our final result resolves the problem of controlling the overall growth of accumulated conversion error within a mixed base computational environment by a process which standardizes a datum's value in each of the significance spaces involved.


II. FLOATING POINT NUMBER SYSTEMS


A formalization of floating point number systems must start with a characterization of the set of floating point numbers, preferably divorced from the cumbersome digit sequence representational notation. In previous articles [1-4] this set has been termed a significance space.

Definition: For the integers $\beta \ge 2$, called the base, and $n \ge 1$, called the significance (or precision), let the significance space, $S_\beta^n$, be the following set of real numbers:

\[ S_\beta^n = \{\, b \mid b = k\beta^j \text{ for some integers } k, j \text{ where } |k| < \beta^n \,\} \]


For clarity we shall utilize the Greek letters $\beta$ and $\delta$ to denote bases; the English letters a, b, c and d will denote elements of a significance space, i.e. the so called "floating point" numbers; the letters i, j, k, l, m, n, p and q will denote integers; and x, y and z will denote arbitrary real numbers.


For an element $b = k\beta^j \in S_\beta^n$ the actual floating point representation of b can be visualized as having the fixed point integer portion k represented by a sign and n or less digits to the assumed base $\beta$, with the exponent portion then represented by the integer j. The floating point representation just described will not in general be unique, with digit sequence realizations of b corresponding to both "normalized" and "unnormalized" forms possible in some cases. Considerations related to the non-unique determination of k, j in $b = k\beta^j$ will be treated where necessary; however, our main concern is with membership of b in the set of real numbers $S_\beta^n$, for which the form of representation is irrelevant. Note that the significance space $S_\beta^n$ differs from an actual floating point number system in that there is no bound on the exponent portion of the members $b = k\beta^j \in S_\beta^n$. Thus $S_\beta^n$ is actually an infinite set. Since we shall not concern ourselves with underflow and overflow problems, the significance space $S_\beta^n$ is a perfectly acceptable model of a floating point number system for our purposes.

It is easy to visualize the change in the set $S_\beta^n$ caused by varying the significance, since increasing n to n+1 maintains all members of $S_\beta^n$ and adds $\beta - 1$ new members uniformly spaced between every neighboring pair of members of $S_\beta^n$. The dependence of the membership of $S_\beta^n$ on the base $\beta$ is far more subtle. In practice it has been convenient to identify a non-decimal floating point number system with the "appropriate" decimal based system; however, we now show that such a purported equivalence glosses over certain inherent anomalies between differently based floating point number systems.

A gross comparison of two differently based significance spaces can be obtained by determining the relative number of members of each space over a comparable range. For example the 3 (significant) bit binary numbers between unity and one thousand are $1._2 = 1.$, $1.01_2 = 1.25$, $1.10_2 = 1.5$, $1.11_2 = 1.75$, $10.0_2 = 2.$, $10.1_2 = 2.5$, $11.0_2 = 3.$, $11.1_2 = 3.5$, $100._2 = 4.$, $101._2 = 5.$, ..., $1100000000._2 = 768.$, $1110000000._2 = 896.$, and the 1 (significant) digit decimal numbers over the same range are 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000. In figure 1 these members are indicated by tickmarks on the real line plotted on a logarithmic scale so that the log periodic nature of the spacing between floating point numbers is evident. The ratio of the numbers of members of these systems over this interval is 40/28 = 1.43. Now this ratio of membership density will vary with the choice of interval; however, a reasonable overall comparison of any two floating point systems can be calculated by determining the limit of such a ratio as the interval grows to include the whole real line.
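The count 40/28 quoted above can be checked by direct enumeration. The following is a minimal sketch, assuming exact rational arithmetic; the helper name `members` is hypothetical and not part of the original report.

```python
from fractions import Fraction

def members(beta, n, lo, hi):
    """Positive members k * beta**j of the n-digit base-beta significance
    space lying in [lo, hi]; assumes 0 < lo <= hi."""
    lo, hi = Fraction(lo), Fraction(hi)
    base = Fraction(beta)
    j = 0
    # Step the exponent down until even the largest n-digit k falls below lo.
    while base ** (j + n) > lo:
        j -= 1
    found = set()
    # Walk upward; normalized k (leading digit non-zero) reaches every member.
    while base ** (j + n - 1) <= hi:
        for k in range(beta ** (n - 1), beta ** n):
            value = k * base ** j
            if lo <= value <= hi:
                found.add(value)
        j += 1
    return sorted(found)

binary = members(2, 3, 1, 1000)     # 3 (significant) bit binary numbers
decimal = members(10, 1, 1, 1000)   # 1 (significant) digit decimal numbers
print(len(binary), len(decimal), round(len(binary) / len(decimal), 2))  # 40 28 1.43
```

Using `Fraction` avoids any rounding in the enumeration itself, so the counts depend only on the definition of the significance space.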
Theorem 1: Let $S_\beta^n$ and $S_\delta^m$ be any two significance spaces. Then letting $|S|$ denote the number of members of the set S,

\[
\lim_{M \to \infty}
\frac{\bigl|\{\, d \mid d \in S_\delta^m,\ 1/M \le |d| \le M \,\}\bigr|}
     {\bigl|\{\, b \mid b \in S_\beta^n,\ 1/M \le |b| \le M \,\}\bigr|}
= \frac{(\delta-1)\,\delta^{m-1}}{(\beta-1)\,\beta^{n-1}} \,\log_\delta \beta
\qquad (1)
\]
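Proof: Let $\lfloor x \rfloor$ denote the greatest integer in x. Then the closed interval $[1/M, M]$ may be divided into $2\lfloor \log_\beta M \rfloor$ disjoint half-open half-closed intervals of the form $[\beta^i, \beta^{i+1})$ and two subintervals of such intervals. Each interval $[\beta^i, \beta^{i+1})$ contains $(\beta-1)\beta^{n-1}$ distinct members of $S_\beta^n$. Noting that $b \in S_\beta^n \Leftrightarrow -b \in S_\beta^n$, we have for $M > 1$:

\[
\begin{aligned}
\bigl|\{\, b \mid b \in S_\beta^n,\ 1/M \le |b| \le M \,\}\bigr|
  &= 2\,\bigl|\{\, b \mid b \in S_\beta^n,\ 1/M \le b \le M \,\}\bigr| \\
  &= 2\,(2\lfloor \log_\beta M \rfloor + c)\,(\beta-1)\,\beta^{n-1} \quad \text{where } 0 \le c < 2 \\
  &= 2\,(2\log_\beta M + c')\,(\beta-1)\,\beta^{n-1} \quad \text{where } |c'| < 2,
\end{aligned}
\]

the latter resulting from removal of the greatest integer brackets. Finally,

\[
\begin{aligned}
\lim_{M \to \infty}
\frac{\bigl|\{\, d \mid d \in S_\delta^m,\ 1/M \le |d| \le M \,\}\bigr|}
     {\bigl|\{\, b \mid b \in S_\beta^n,\ 1/M \le |b| \le M \,\}\bigr|}
  &= \lim_{M \to \infty}
     \frac{2\,(2\log_\delta M + c_1)\,(\delta-1)\,\delta^{m-1}}
          {2\,(2\log_\beta M + c_2)\,(\beta-1)\,\beta^{n-1}}
     \quad \text{where } |c_1|, |c_2| < 2 \\
  &= \frac{(\delta-1)\,\delta^{m-1}}{(\beta-1)\,\beta^{n-1}} \,\log_\delta \beta .
\end{aligned}
\]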


In the literature on conversion there is an oft quoted notion that a decimal digit is equivalent to $\log_2 10 = 3.32\ldots$ bits, so that a 3 digit decimal system would be only slightly less numerous than a 10 bit binary system. However, if we symbolically denote the limiting ratio given in equation (1) by $|S_\delta^m / S_\beta^n|$, then $|S_{10}^3 / S_2^{10}| = .529\ldots$, so that there are actually only 53% as many real numbers representable with 3 significant decimal digits as there are real numbers representable with 10 significant bits. Furthermore the ratio of the number of members of $S_2^3$ to $S_{10}^1$ over the range shown in figure 1 is 1.43, and the limiting ratio is $|S_2^3 / S_{10}^1| = 1.476\ldots$, attesting to the fact that there are about one and a half times as many 3 significant bit binary numbers as 1 significant digit decimal numbers, in clear contradiction to the "digit = 3.32 bits" rule. This anomaly prevails even with more digits, since for large integral m and n chosen such that $10^m / 2^n$ approaches unity,

\[
\lim_{\substack{m \to \infty \\ 10^m/2^n \to 1}} |S_{10}^m / S_2^n|
= \frac{9}{10} \cdot 2 \cdot \log_{10} 2 = .5418
\]
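The limiting ratios quoted above follow directly from equation (1). As a minimal sketch (the function name `density_ratio` is a hypothetical label, and ordinary floating point logarithms are assumed to be accurate enough for three figures):

```python
from math import log

def density_ratio(delta, m, beta, n):
    """Limiting ratio |S_delta^m / S_beta^n| given by equation (1)."""
    digits_term = (delta - 1) * delta ** (m - 1) / ((beta - 1) * beta ** (n - 1))
    return digits_term * log(beta, delta)   # log(beta, delta) is log base delta of beta

print(round(density_ratio(10, 3, 2, 10), 3))  # 3 decimal digits vs 10 bits: 0.529
print(round(density_ratio(2, 3, 10, 1), 3))   # 3 bits vs 1 decimal digit: 1.476
```

Note the argument order: the first pair describes the system in the numerator of (1), the second pair the system in the denominator.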


The anomaly is readily explained on intuitive grounds. The relation "digit = $\log_2 10$ bits" comes from counting digit patterns, all of which are distinguishable, and this is appropriate for integer and fixed point number systems. Floating point systems have redundant representations of some numbers, generally resolved by normalization, so that not all patterns of n digits yield distinct members for each exponent j. Furthermore, the number of members per power of the base depends on the base. Both of these conditions are reflected in the final form of equation (1).

A more accurate "equivalent digit" formula is obtained if we equate the right hand side of (1) to unity and solve for m in terms of n, $\beta$ and $\delta$. Thus if $S_\delta^m$ is considered to be more dense than $S_\beta^n$ when $|S_\delta^m / S_\beta^n| > 1$, then from (1):


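Corollary 1.1: $S_\delta^m$ is more dense than $S_\beta^n$ if and only if

\[
m > n \log_\delta \beta + \log_\delta\!\left[\frac{\delta(\beta-1)}{\beta(\delta-1)}\right] - \log_\delta \log_\delta \beta
\qquad (2)
\]

Attributing meaning to a non-integral number of digits, one may propose that

\[
m = n \log_\delta \beta + \log_\delta\!\left[\frac{\delta(\beta-1)}{\beta(\delta-1)}\right] - \log_\delta \log_\delta \beta
\qquad (3)
\]

is the "equivalent digit formula for floating point number systems". For binary-decimal conversion we then would have

\[
\#\,\text{bits} = 3.32\ldots \times (\#\,\text{decimal digits}) - .884
\qquad (4)
\]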
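The constants in formula (4) can be recovered numerically from formula (3). The sketch below is illustrative only; `equivalent_digits` is a hypothetical name, and floating point logarithms are assumed adequate for three decimal places.

```python
from math import log

def equivalent_digits(n, beta, delta):
    """Non-integral m at which S_delta^m has the same limiting density
    as S_beta^n, following formula (3)."""
    ratio = log(beta, delta)                      # log base delta of beta
    offset = log(delta * (beta - 1) / (beta * (delta - 1)), delta) - log(ratio, delta)
    return n * ratio + offset

# Binary-decimal case (beta = 10, delta = 2): slope and constant of formula (4).
slope = equivalent_digits(1, 10, 2) - equivalent_digits(0, 10, 2)
constant = equivalent_digits(0, 10, 2)
print(round(slope, 2), round(constant, 3))  # 3.32 -0.884
```

The negative constant is what separates the floating point formula from the fixed point rule "digit = 3.32 bits".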
The variability of spacing of floating point numbers of a given significance is such that the simplified formula (3) does not really provide an adequate comparison of differently based floating point systems, and more attention must be given to local magnitude dependent variability. In studying the internal structure of $S_\beta^n$ note that every one of its non-zero members will have both a next largest and next smallest neighbor in $S_\beta^n$.

Definition: The successor, $b'$, of $b \in S_\beta^n$, $b \ne 0$, is given by

\[ b' = \min\{\, d \mid d > b,\ d \in S_\beta^n \,\} \qquad (5) \]

and since distinct members of $S_\beta^n$ have distinct successors, b may then be referred to as the (unique) predecessor of $b'$ in $S_\beta^n$.

Now the absolute difference, $b' - b$, will grow with b in $S_\beta^n$; however, the relative difference is bounded [1].

Definition: The gap, $\Gamma_\beta^n(x)$, in $S_\beta^n$ at x is given by

\[
\Gamma_\beta^n(x) = \frac{\min\{\, b \mid b > x,\ b \in S_\beta^n \,\} - \max\{\, b \mid b \le x,\ b \in S_\beta^n \,\}}{x}
\quad \text{for } x > 0,
\qquad (6)
\]

with the analogous expression, symmetric in sign, for $x < 0$.

Specifically then $\Gamma_\beta^n(a) = (a' - a)/a$ for $0 < a \in S_\beta^n$. From the structure of floating point number systems it is evident that $\Gamma_\beta^n(\beta x) = \Gamma_\beta^n(x)$, so $\Gamma_\beta^n$ will experience a log periodic behavior. For $1 \le x < \beta$, the numerator of (6) will have the constant value $\beta^{1-n}$, so that on a log-log scale the gap function appears as a saw tooth function. In figure 2, sections of the gap functions $\Gamma_{10}^4$ and $\Gamma_{16}^4$ are illustrated. Note that the variation in the magnitude of the gap function is greater for larger bases.
i 
From theorem 1 we can calculate that $|S_{16}^4 / S_{10}^4| = 5.66\ldots$, so that overall there are more than five times as many 4 significant digit hexadecimal numbers as 4 significant digit decimal numbers. Yet from the gap functions in figure 2 it is apparent that over the interval (.0625, .1000) there are more 4 significant digit decimal numbers than 4 significant digit hexadecimal numbers, since $\Gamma_{10}^4(x) < \Gamma_{16}^4(x)$ over that interval.

From the log periodic behavior of $\Gamma_\beta^n$ the following bounds are immediate.

Theorem 2: The function $\Gamma_\beta^n$ attains both a minimum and a maximum value over the non-zero members of $S_\beta^n$, given by

\[
\begin{aligned}
\min\{\, \Gamma_\beta^n(b) \mid b \in S_\beta^n,\ b \ne 0 \,\} &= 1/(\beta^n - 1) \\
\max\{\, \Gamma_\beta^n(b) \mid b \in S_\beta^n,\ b \ne 0 \,\} &= 1/\beta^{n-1}
\end{aligned}
\]

and over the non-zero reals the bounds on $\Gamma_\beta^n(x)$ are given by

\[
\begin{aligned}
\inf\{\, \Gamma_\beta^n(x) \mid x \ne 0 \,\} &= 1/\beta^n \\
\max\{\, \Gamma_\beta^n(x) \mid x \ne 0 \,\} &= 1/\beta^{n-1}
\end{aligned}
\qquad (7)
\]

Thus the gap function presents a more complete picture of the structure of a floating point number system than any "equivalent digit" notion, and simplified formulas such as (3) and (4) must be used with extreme caution in any comparison of differently based floating point number systems.
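The gap function and the bounds of theorem 2 can be explored numerically. The following is a minimal sketch for positive x only, assuming exact rational arithmetic; the name `gap` is hypothetical.

```python
from fractions import Fraction

def gap(beta, n, x):
    """Gap of the n-digit base-beta significance space at x > 0."""
    x = Fraction(x)
    base = Fraction(beta)
    j = 0
    # Choose j so that beta**(n-1) <= x / beta**j < beta**n.
    while x / base ** j >= beta ** n:
        j += 1
    while x / base ** j < beta ** (n - 1):
        j -= 1
    # The members on either side of x are k*beta**j and (k+1)*beta**j,
    # so the numerator of the gap is beta**j.
    return base ** j / x

x = Fraction(8, 100)                       # a point inside (.0625, .1000)
print(gap(10, 4, x) < gap(16, 4, x))       # True: decimal is locally denser
print(gap(10, 4, 1000), gap(10, 4, 9999))  # 1/1000 1/9999
```

The last line shows the maximum and minimum of theorem 2 over the members of the 4 digit decimal system, attained at the smallest and largest normalized integer portions.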


III. CONVERSION MAPPINGS


A conversion procedure determines a specific value in $S_\beta^n$ for each real number x. Thus, a conversion process may be characterized as either a function or a mapping. Formalization of conversion as a mapping appears preferable since it is often useful to invert the question and refer to the set of points which map into a given element of $S_\beta^n$, and this notion is then readily available by considering the inverse mapping. Certainly any conversion mapping of the reals to $S_\beta^n$ should be the identity on $S_\beta^n$, and in addition it is desirable for this mapping to satisfy certain order preserving properties.

Definition: A mapping, M, of a set $R'$ of real numbers into the reals is

1) Weakly order preserving (isotone [6], monotone) on $R'$ if $x < y \Rightarrow M(x) \le M(y)$ for all $x, y \in R'$

2) Strongly order preserving on $R'$ if $x < y \Rightarrow M(x) < M(y)$ for all $x, y \in R'$

Furthermore the mapping is said to be weakly (strongly) order preserving if it is weakly (strongly) order preserving on the reals.


No conversion mapping of the reals into $S_\beta^n$ can be strongly order preserving; however, we can expect that a conversion process should at least be weakly order preserving in addition to being the identity on $S_\beta^n$. These latter two conditions do assure that the inverse image of any $b \in S_\beta^n$ under a conversion mapping is an interval containing b. Formally we shall limit our discussion to the rounding and truncation (sometimes called chopping) conversion procedures usually encountered in computerized numeric processing.

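Definition: The truncation conversion mapping, $T_\beta^n$, and the rounding conversion mapping, $R_\beta^n$, of the real numbers into $S_\beta^n$ are defined for all integers $\beta \ge 2$, $n \ge 1$ as follows:

Truncation Conversion

\[
T_\beta^n(x) =
\begin{cases}
\max\{\, b \mid b \le x,\ b \in S_\beta^n \,\} & \text{for } x \ge 0 \\
\min\{\, b \mid b \ge x,\ b \in S_\beta^n \,\} & \text{for } x < 0
\end{cases}
\qquad (9)
\]

Rounding Conversion

\[
R_\beta^n(x) =
\begin{cases}
\min\{\, b \mid (b + b')/2 > x,\ b \in S_\beta^n \,\} & \text{for } x > 0 \\
\max\{\, b' \mid (b + b')/2 < x,\ b \in S_\beta^n \,\} & \text{for } x < 0 \\
0 & \text{for } x = 0
\end{cases}
\qquad (10)
\]

where $b'$ denotes the successor of b as in (5). The effects of these conversion mappings in the neighborhood of a power of the base are shown in figure 3. Note that the distinctions in the definitions of the mappings of positive and negative values are required to achieve the desired sign complementary relations:

\[ T_\beta^n(-x) = -T_\beta^n(x), \qquad R_\beta^n(-x) = -R_\beta^n(x). \qquad (11) \]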


Figure 3: Conversion of the real numbers to the n (significant) digit base $\beta$ numbers in the neighborhood of a power of the base by (a) truncation conversion, $T_\beta^n$, and (b) rounding conversion, $R_\beta^n$.
Note that $R_\beta^n\bigl((b + b')/2\bigr) = b'$ for all positive $b \in S_\beta^n$, so that we have not imposed the additional symmetric rounding condition for mid points dependent on the "parity" of b that some researchers prefer. Although this refinement could be added without materially affecting our results, we prefer the definition above.
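The two conversion mappings can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not a description of any machine's conversion routine: inputs are exact rationals, negative arguments are handled through the sign relations (11), and mid points are rounded away from zero as in the note above. The names `truncate`, `round_half_up` and `_unit` are hypothetical.

```python
from fractions import Fraction
from math import floor

def _unit(beta, n, x):
    """beta**j with beta**(n-1) <= x / beta**j < beta**n, for x > 0."""
    base, j = Fraction(beta), 0
    while x / base ** j >= beta ** n:
        j += 1
    while x / base ** j < beta ** (n - 1):
        j -= 1
    return base ** j

def truncate(beta, n, x):
    """Truncation conversion of x into the n-digit base-beta space."""
    x = Fraction(x)
    if x == 0:
        return x
    if x < 0:
        return -truncate(beta, n, -x)
    unit = _unit(beta, n, x)
    return floor(x / unit) * unit

def round_half_up(beta, n, x):
    """Rounding conversion; a mid point maps to the neighbor of larger magnitude."""
    x = Fraction(x)
    if x == 0:
        return x
    if x < 0:
        return -round_half_up(beta, n, -x)
    unit = _unit(beta, n, x)
    return floor(x / unit + Fraction(1, 2)) * unit

print(truncate(2, 3, Fraction(9, 5)))        # 7/4
print(round_half_up(10, 1, Fraction(3, 2)))  # 2
```

Both functions return their argument unchanged when it is already a member of the target space, which is the identity property required of a conversion mapping.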

It is evident that $T_\beta^n$ and $R_\beta^n$ both are weakly order preserving mappings which are identities on $S_\beta^n$ as desired. In practical numeric computation we often need to convert data already expressed in floating point form to a differently based floating point form. Thus we are interested in the properties of the restricted mappings

\[ R_\delta^m \mid S_\beta^n \to S_\delta^m \quad \text{and} \quad T_\delta^m \mid S_\beta^n \to S_\delta^m . \]

From consideration of the algorithmic mechanics of conversion there is evidently a considerable difference depending on how the two bases are related. Since the relationships between bases do affect the properties of restricted conversion mappings, there is need to characterize this important "commensurability" relationship between bases.

Definition: Let $\beta \ge 2$ be a root free integer, i.e. $\beta$ has no integral i-th root for any $i \ge 2$. Then the numbers $\beta, \beta^2, \beta^3, \ldots$ form a commensurable family of bases termed the $\beta$-family of bases. Two or more bases belonging to the same commensurable family of bases are commensurable bases. Two bases which do not belong to a common commensurable family are termed incommensurable bases. Furthermore, two or more significance spaces will be termed commensurable when their bases are commensurable.

Thus two, eight and sixteen are commensurable bases, whereas base ten is incommensurable with any member of the binary family of bases. The root free condition on $\beta$ in this definition simply assures that each base is in precisely one family.

A useful equivalent characterization of commensurable and incommensurable bases, avoiding explicit mention of the respective families to which the bases belong, is provided in the following.

Lemma 3: The bases $\beta$ and $\delta$ are commensurable if and only if $\beta^i = \delta^j$ for some non-zero integers i, j.

Corollary 3.1: The bases $\beta$ and $\delta$ are commensurable if and only if $\log_\delta \beta$ is rational.

Thus, the gap functions $\Gamma_\beta^n$ and $\Gamma_\delta^m$ plotted on a log-log scale will share a common period when $\beta$ and $\delta$ are commensurable, as in figure 4, and otherwise will not (see figure 2).
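Commensurability of two integer bases can be tested by reducing each to the root free member of its family. The sketch below is one possible way to do this and is intended only for bases of modest size, since it relies on a floating point estimate of the integer root; the function names are hypothetical.

```python
def root_free_base(beta):
    """Smallest integer r >= 2 with beta == r**i for some i >= 1."""
    for i in range(beta.bit_length(), 1, -1):    # largest exponents first
        guess = round(beta ** (1.0 / i))
        for r in (guess - 1, guess, guess + 1):  # guard against float error
            if r >= 2 and r ** i == beta:
                return r
    return beta                                  # beta itself is root free

def commensurable(beta, delta):
    """True when both bases belong to the same commensurable family."""
    return root_free_base(beta) == root_free_base(delta)

print(commensurable(8, 16), commensurable(2, 10))  # True False
```

Trying the largest exponent first guarantees that the returned root has no further integral root, so it identifies the family uniquely.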


Figure 4: Sections of the gap functions $\Gamma_{16}^6$ for the 6 (significant) digit hexadecimal numbers and $\Gamma_8^8$ for the 8 (significant) digit octal numbers. Hexadecimal and octal are commensurable bases and share a common period of $2^{12}$ on the log scale. Note that the 6 digit hexadecimal numbers and the 8 digit octal numbers are not equivalent floating point systems.


It is readily verified that the conversion by rounding or truncation from a fixed point number system with n $\beta$-ary digits to the right of the radix point to a fixed point system with m $\delta$-ary digits to the right of the radix point will be one-to-one if $m \ge n \log_\delta \beta$ and onto if $m \le n \log_\delta \beta$. Thus a conversion between fixed point systems must be either one-to-one or onto, whereas a conversion between floating point number systems need be neither one-to-one nor onto, as seen in figure 5. The necessary and sufficient conditions for rounding and truncation conversions between incommensurable significance spaces to be (1) one-to-one mappings and (2) onto mappings have been determined [2] in the Base Conversion Theorem.

Theorem 4 (Base Conversion Theorem): For incommensurable bases $\beta$ and $\delta$, the truncation (rounding) conversion mapping of $S_\beta^n$ to $S_\delta^m$, i.e. $T_\delta^m \mid S_\beta^n \to S_\delta^m$ ($R_\delta^m \mid S_\beta^n \to S_\delta^m$),

\[
\begin{aligned}
&\text{1) is one-to-one if and only if } \delta^{m-1} \ge \beta^n - 1, \\
&\text{2) is onto if and only if } \beta^{n-1} \ge \delta^m - 1.
\end{aligned}
\qquad (12)
\]


The details of the proof of this theorem are given in [2]; however, an intuitive understanding of the result can be gleaned from comparing the gap functions (see figure 2). Certainly if $\Gamma_\delta^m$ is uniformly less than $\Gamma_\beta^n$, then the mapping of $S_\beta^n$ to $S_\delta^m$ should be one-to-one. Conversely, if the maximum of $\Gamma_\delta^m$ falls above the minimum (restricted to $S_\beta^n$) of $\Gamma_\beta^n$, then from a theorem of Kronecker (that the integer multiples of an irrational number mod 1 are dense in the unit interval) it can be shown that for some $b \in S_\beta^n$, $\Gamma_\delta^m(b) > \Gamma_\beta^n(b)$.

In summary the essential effect of formulae (12) regarding the conversion of the initial system $S_\beta^n$ to the target system $S_\delta^m$ is that to assure one-to-one conversion a digit must be sacrificed in the target system, and to assure onto conversion a digit must be sacrificed in the initial system.

The conditions for one-to-one conversion guarantee another desirable property of the conversion mapping, since it is readily shown that a weakly order preserving mapping on $R'$ is strongly order preserving on $R'$ if and only if the mapping is one-to-one from $R'$ to the reals.


Figure 5: Conversion by rounding of the 3 (significant) bit binary numbers to the 1 (significant) digit decimal numbers over the range [1, 1000], indicating that $R_{10}^1 \mid S_2^3 \to S_{10}^1$ is neither one-to-one nor onto.

Corollary 4.1: $T_\delta^m$ and $R_\delta^m$ are each strongly order preserving on $S_\beta^n$ if and only if $\delta^{m-1} \ge \beta^n - 1$.
It is instructive to consider the implications of the Base Conversion Theorem for decimal to binary conversion. It follows that the mapping by rounding or truncation conversion of a decimal based significance space to a binary based significance space will be one-to-one (and strongly order preserving) if and only if

\[ \#\,\text{bits} \ge 3.32\ldots \times (\#\,\text{digits}) + 1 \qquad (13) \]

and onto if and only if

\[ \#\,\text{bits} \le 3.32\ldots \times (\#\,\text{digits}) - 3.32\ldots \qquad (14) \]

Relating these inequalities (13, 14) to the membership density formula (1), we conclude that decimal to binary floating point conversion will be (i) one-to-one and strongly order preserving if and only if the decimal system has less than $(9/10)(\log_{10} 2) = .271$ times the membership of the binary system and (ii) onto if and only if the decimal system has more than $18\,(\log_{10} 2) = 5.418$ times the membership of the binary system. Thus conversions between floating point systems of nearly equal density will be neither one-to-one nor onto.
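The integer conditions of the Base Conversion Theorem are easy to evaluate directly. The following sketch encodes the two inequalities as they are stated for incommensurable bases; the function names and the helper `min_bits_for_digits` are hypothetical conveniences, not terminology from the theorem.

```python
def is_one_to_one(beta, n, delta, m):
    """One-to-one condition for converting S_beta^n into S_delta^m."""
    return delta ** (m - 1) >= beta ** n - 1

def is_onto(beta, n, delta, m):
    """Onto condition for converting S_beta^n into S_delta^m."""
    return beta ** (n - 1) >= delta ** m - 1

def min_bits_for_digits(digits):
    """Fewest bits for which decimal to binary conversion is one-to-one."""
    bits = 1
    while not is_one_to_one(10, digits, 2, bits):
        bits += 1
    return bits

# 3 bit binary to 1 digit decimal (the case of figure 5): neither property holds.
print(is_one_to_one(2, 3, 10, 1), is_onto(2, 3, 10, 1))  # False False
print(min_bits_for_digits(3))                            # 11
```

The value 11 for three decimal digits agrees with inequality (13), since 3.32 x 3 + 1 is just under 11.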

The properties of decimal-binary conversion are succinctly presented in figure 6. The lattice point n, m corresponds to the binary system $S_2^n$ and the decimal system $S_{10}^m$. Lattice points falling to the left of the line $n = (\log_2 10)\,m + 1 = 3.32m + 1$ correspond to decimal to binary conversions which are one-to-one and strongly order preserving, as well as binary to decimal conversions which are onto. Lattice points falling to the right of the line $n = 3.32m - 3.32$ correspond to decimal to binary conversions which are onto and to binary to decimal conversions which are one-to-one and strongly order preserving. Lattice points falling between $n = 3.32m + 1$ and $n = 3.32m - 3.32$ correspond to decimal-binary conversions which have none of these three properties. The equal density line $n = 3.32m - .88$ separates the lattice points so that those to the left correspond to binary systems which are more dense than the decimal systems and lattice points to the right correspond to the decimal system being more dense. An increase of one unit on the n axis increases $|S_2^n / S_{10}^m|$ by a factor of 2, and a one unit increase on the m axis decreases this density ratio by a factor of 10, so $|S_2^n / S_{10}^m|$ may be easily estimated by determining the distance of the lattice point n, m from the equal density line.


Figure 6: A summary of the properties of floating point decimal-binary base conversion (axes: # bits, # decimal digits).
IV. COMPOUND CONVERSIONS

The preceding section dealt only with the properties of a single conversion mapping. We have indicated previously that computing environments such as PL/I language programming, multi-computer networks and B.C.D. tape updating on a binary machine present situations where data may be subjected to multiple base conversions during overall job execution. Considerable care must be provided in such mixed base computing environments to avoid excessive accumulation of error in purportedly "constant" data. To critically analyze this problem the notion of compound conversion is formally introduced.

Definition: For all $n \ge 1$, $\beta \ge 2$,

i) $T_\beta^n$ and $R_\beta^n$ are 1-fold compound conversions through $S_\beta^n$,

ii) for Q a k-fold compound conversion through $S_{\beta_1}^{n_1}, S_{\beta_2}^{n_2}, \ldots, S_{\beta_k}^{n_k}$, the mappings $T_\beta^n Q$ and $R_\beta^n Q$ are (k+1)-fold compound conversions through $S_{\beta_1}^{n_1}, S_{\beta_2}^{n_2}, \ldots, S_{\beta_k}^{n_k}, S_\beta^n$.

Furthermore Q is a compound truncation (rounding) conversion if all the individual conversions composing Q are truncation (rounding) conversions.

Thus $R_\beta^n R_\delta^m$ is a 2-fold compound rounding conversion through $S_\delta^m, S_\beta^n$, and a composition of four such mappings, some rounding and some truncating, is a 4-fold compound conversion through the four corresponding significance spaces (see figure 7).

A compound conversion is a composition of mappings, and many properties of individual mappings readily carry over to compositions of such mappings. Thus the following important properties of compound conversions are immediate.

Lemma 5: If $Q$ is a compound conversion, then $Q(-x) = -Q(x)$ for all $x$.

Lemma 6: Compound conversions are weakly order preserving.

An evident property of truncation conversion is that the magnitude of $T_\beta^n(x)$ is never larger than the magnitude of $x$. Thus truncation conversion performs a contraction of the reals towards zero, as illustrated earlier for truncation conversion.

Definition: The function M: Reals → Reals is a contraction if $M(x)$ has the same sign as $x$ and $|M(x)| \le |x|$ for all $x$.


Figure 7: The composition of a compound conversion from successive conversions: (a) the sequence of individual conversions; (b) the resulting 4-fold compound conversion Q.

Clearly a composition of contractions is a contraction.

Lemma 7: Compound truncation conversions are contractions.

The mappings $T_\beta^n$ and $R_\beta^n$ are identities on their image space, $S_\beta^n$; however, a compound conversion need not be the identity on its image space [...]. Thus the elements of the image space of a compound conversion $Q$ are not necessarily mapped into themselves by the mapping $Q$ [...]. The points which a compound conversion does map into themselves may be difficult to determine but are nonetheless important, since the weakly order preserving property of compound conversions assures us that no point in the interval between two such invariant points of a compound conversion can be mapped outside that interval by that compound conversion.

Formally, if $M(x) = x$ then $x$ is customarily called a fixed point of the mapping $M$. However, to avoid confusion with the computational notion of “fixed point numbers”, we shall refer to fixed points of mappings as invariant points.


Definition: Let $M$ be a mapping of the reals to the reals. Then the invariant set of $M$, denoted $\mathcal{I}(M)$, is given by

$$\mathcal{I}(M) = \{x \mid M(x) = x\} \qquad (15)$$

and for $x \in \mathcal{I}(M)$ we say that $x$ is an invariant point of the mapping $M$.


Our first result characterizing the invariant sets of compound conversions states that any element common to all significance spaces through which a compound conversion passes is an invariant point of that compound conversion.

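Lemma 8: If $Q$ is a $k$-fold compound conversion through $S_{\beta_1}^{n_1}, S_{\beta_2}^{n_2}, \ldots, S_{\beta_k}^{n_k}$, then $\bigcap_{i=1}^{k} S_{\beta_i}^{n_i} \subset \mathcal{I}(Q)$.

Proof: For any $x \in \bigcap_{i=1}^{k} S_{\beta_i}^{n_i}$, each of the conversions composing $Q$ maps $x$ into itself, since $T_{\beta_i}^{n_i}$ and $R_{\beta_i}^{n_i}$ are identities on $S_{\beta_i}^{n_i}$ for $i = 1, \ldots, k$. Thus $Q(x) = x$, and $x \in \mathcal{I}(Q)$.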

Furthermore it is now shown that if $Q$ is a compound truncation conversion, then the points in the intersection of the $S_{\beta_i}^{n_i}$ are the only invariant points of the mapping.


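Theorem 9: If $Q$ is a $k$-fold compound truncation conversion through $S_{\beta_1}^{n_1}, S_{\beta_2}^{n_2}, \ldots, S_{\beta_k}^{n_k}$, then $\mathcal{I}(Q) = \bigcap_{i=1}^{k} S_{\beta_i}^{n_i}$.

Proof: By lemma 8 the intersection is contained in $\mathcal{I}(Q)$, so let $x \notin \bigcap_{i=1}^{k} S_{\beta_i}^{n_i}$ and let us show $x \notin \mathcal{I}(Q)$. From sign complementarity of $Q$ (lemma 5), $x > 0$ may be assumed. Let $j$ be some index such that $x \notin S_{\beta_j}^{n_j}$. Compound truncation conversions are contractions and truncation conversion is weakly order preserving, so

$$x > T_{\beta_j}^{n_j}(x) \ge T_{\beta_j}^{n_j}\bigl(T_{\beta_{j-1}}^{n_{j-1}} \cdots T_{\beta_1}^{n_1}(x)\bigr) \ge Q(x),$$

thus $x \ne Q(x)$, proving the theorem.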


Lemma 8 and Theorem 9 show the importance of the intersection of significance spaces with regard to determining the invariant points of compound conversion mappings. In turn the character of the intersection of significance spaces depends in large part on the commensurability or incommensurability of the significance spaces involved. Specifically the base sixteen is in the binary family, and every hexadecimal number may be easily converted to binary by writing each hexadecimal digit as the appropriate four bits. For $b \in S_{16}^{6}$ the 24 bit binary representation of the six hexadecimal digits of $b$ may have up to three leading zeros, so some 22 bit binary numbers may not be representable in $S_{16}^{6}$, e.g. $(1 + 2^{-21})$; however every 21 bit binary number will be contained in $S_{16}^{6}$. In generalizing this notion it is evident that if $\beta = \delta^p$, then each base $\beta$ digit may be represented by $p$ base $\delta$ digits; thus an $n$-digit base $\beta$ integer yields an $np$-digit base $\delta$ representation of which no more than $p - 1$ leading digits may be zero. Setting $m = (n-1)p + 1$, we then have $S_\delta^m \subset S_\beta^n$, $S_\delta^{m+1} \not\subset S_\beta^n$, and the following theorem is derived.


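Theorem 10: Let $\beta_1, \beta_2, \ldots, \beta_k$ be commensurable bases of the $\delta$ family. Then for $m = 1 + \min_i \{(n_i - 1)\log_\delta \beta_i\}$,

$$S_\delta^m \subset \bigcap_{i=1}^{k} S_{\beta_i}^{n_i} \quad\text{and}\quad S_\delta^{m+1} \not\subset \bigcap_{i=1}^{k} S_{\beta_i}^{n_i} \qquad (16)$$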
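The hexadecimal case discussed before the theorem can be checked directly. The following is a minimal sketch, assuming that membership in a significance space means $x = k\,\beta^j$ with integers $k, j$ and $|k| < \beta^n$; the helper name `in_space` is hypothetical and not taken from the report.

```python
from fractions import Fraction
from math import gcd

def in_space(x, base, n):
    """True if x = k * base**j for integers k, j with |k| < base**n."""
    x = Fraction(x)
    if x == 0:
        return True
    # Scale by the base until x is an integer; this is impossible when the
    # denominator shares no factor with the base.
    while x.denominator != 1:
        if gcd(x.denominator, base) == 1:
            return False
        x *= base
    k = abs(x.numerator)
    while k % base == 0:          # right-shift trailing zero digits
        k //= base
    return k < base ** n

# beta = 16 = 2**4 and n = 6 give m = (6 - 1) * 4 + 1 = 21.
x21 = 1 + Fraction(1, 2 ** 20)    # needs 21 significant bits
x22 = 1 + Fraction(1, 2 ** 21)    # needs 22 significant bits
print(in_space(x21, 2, 21), in_space(x21, 16, 6))   # True True
print(in_space(x22, 2, 22), in_space(x22, 16, 6))   # True False
```

Exact rational arithmetic is used so that no binary floating point rounding of the host machine interferes with the membership test.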

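Actually theorem 10 can be sharpened if we also consider the different intervals $[\delta^j, \delta^{j+1}]$. Reasoning as above it can be shown that by letting $p_i = \log_\delta \beta_i$ and setting $m = 1 + \min_i \{p_i(n_i - 1) + (j \bmod p_i)\}$, we then have

$$S_\delta^m \cap [\delta^j, \delta^{j+1}] = \bigcap_{i=1}^{k} \bigl(S_{\beta_i}^{n_i} \cap [\delta^j, \delta^{j+1}]\bigr) \qquad (17)$$

Furthermore if $q$ is the minimizing value of $i$ in the above definition of $m$, then also $S_\delta^m = S_{\beta_q}^{n_q}$ over the interval $[\delta^j, \delta^{j+1}]$.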


These observations along with theorem 9 yield important properties of certain compound conversions. If $Q$ is a compound truncation conversion through commensurable significance spaces, then $Q(x) \in \mathcal{I}(Q)$ for all real $x$. Hence $QQ = Q$, and if $Q'$ is any compound truncation conversion through the same commensurable significance spaces as $Q$ but in any order, then $Q' = Q$. Furthermore when $Q = T_{\beta_1}^{n_1} \cdots T_{\beta_k}^{n_k}$ with $\beta_1, \ldots, \beta_k$ in the same commensurable family, then $|x - Q(x)| \le \max_i \Gamma_{\beta_i}^{n_i}(x)$, so accumulated conversion error is effectively controlled.


Now it will be shown that the intersection of incommensurable significance spaces does not contain any common significance space, for such an intersection has only a finite number of elements.


Theorem 11: If $\beta$ and $\delta$ are incommensurable, then $S_\beta^n \cap S_\delta^m$ has no more than $2(\beta^n - 1)(\delta^m - 1) + 1$ members.

Proof: Let $\mathcal{P}$ be the following set of ordered pairs of integers, $\mathcal{P} = \{(k, k^*) \mid |k| < \beta^n,\ |k^*| < \delta^m \text{ and } kk^* \ge 1 \text{ or } k = k^* = 0\}$. Define a mapping $P: S_\beta^n \cap S_\delta^m \to \mathcal{P}$ as follows. $P(0) = (0,0)$. Suppose $b \in S_\beta^n \cap S_\delta^m$, $b \ne 0$. Then $b = k\beta^j$ where $k$ and $j$ may be chosen uniquely such that $|k| < \beta^n$ and $\beta$ does not divide $k$. Similarly, there are unique $k^*$, $j^*$ such that $b = k^*\delta^{j^*}$ where $|k^*| < \delta^m$ and $\delta$ does not divide $k^*$. In this case $P(b) = (k, k^*)$. Thus $P$ yields the normalized (right shifted) integer portions of the representations of any element common to $S_\beta^n$ and $S_\delta^m$. It is now shown that $P$ is a one-to-one mapping of $S_\beta^n \cap S_\delta^m$ into $\mathcal{P}$.

Clearly only zero is mapped into $(0,0)$, so let $a, b \in S_\beta^n \cap S_\delta^m$ be any two non-zero elements where $P(a) = P(b)$. Then there must exist $k, k^* \ne 0$, and $j, j^*, i, i^*$ such that $a = k\beta^j = k^*\delta^{j^*}$ and $b = k\beta^i = k^*\delta^{i^*}$, where $\beta$ does not divide $k$ and $\delta$ does not divide $k^*$. Then $k/k^* = \delta^{j^*}/\beta^{j} = \delta^{i^*}/\beta^{i}$, so that $\delta^{j^* - i^*} = \beta^{j - i}$. Then $j^* = i^*$ and $j = i$ since $\beta$ and $\delta$ are incommensurable, so $a = b$ and $P$ is one-to-one. Now $S_\beta^n \cap S_\delta^m$ can have no more members than $\mathcal{P}$, which has $2(\beta^n - 1)(\delta^m - 1) + 1$ elements, proving the theorem.
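The bound of theorem 11 can be illustrated on a very small case. The sketch below assumes relatively prime bases, so that only integers need to be searched, and it assumes the search window `limit` is wide enough to contain every common member; both the helper names and the choice of window are assumptions of this sketch.

```python
def in_space(b, base, n):
    """Integer b lies in the space if b = k * base**j with |k| < base**n."""
    k = abs(b)
    if k == 0:
        return True
    while k % base == 0:      # strip trailing zero digits
        k //= base
    return k < base ** n

def common_integers(beta, n, delta, m, limit):
    """Integers in [-limit, limit] that lie in both significance spaces."""
    return [b for b in range(-limit, limit + 1)
            if in_space(b, beta, n) and in_space(b, delta, m)]

beta, n, delta, m = 2, 2, 3, 1
members = common_integers(beta, n, delta, m, limit=beta ** n * delta ** m)
bound = 2 * (beta ** n - 1) * (delta ** m - 1) + 1
print(members)                # [-6, -3, -2, -1, 0, 1, 2, 3, 6]
print(len(members), bound)    # 9 13
```

Here the nine common members fall well inside the bound of thirteen, so the bound is not attained in this case.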


Utilizing the fundamental theorem of arithmetic (unique prime decomposition), much more can be said about the members of $S_\beta^n \cap S_\delta^m$ for particular $\beta$ and $\delta$. If $\beta$ and $\delta$ are relatively prime, then all members of $S_\beta^n \cap S_\delta^m$ are integers, with the largest such integer strictly less than [...]. If the greatest common divisor of the incommensurable $\beta$ and $\delta$ is a prime number (as in the binary-decimal case), then the smallest positive element of $S_\beta^n \cap S_\delta^m$ can be shown to be a negative power of that prime. This smallest positive number exactly representable in both systems can be considerably larger than the underflow bound on a typical computer; for example $2^{-10} = .0009765625$ is the smallest positive member of $S_2^{24} \cap S_{10}^{7}$.
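The binary-decimal example can be checked with integer arithmetic alone. The sketch below relies on the identity $2^{-j} = 5^j \cdot 10^{-j}$, so that the number of significant decimal digits of $2^{-j}$ is the number of digits of $5^j$; the function name is hypothetical.

```python
def decimal_digits_of_inverse_power_of_two(j):
    """Significant decimal digits needed to write 2**-j exactly."""
    # 2**-j = 5**j * 10**-j, and 5**j never ends in a zero digit.
    return len(str(5 ** j))

for j in (9, 10, 11):
    print(j, 5 ** j, decimal_digits_of_inverse_power_of_two(j))
# 9 1953125 7
# 10 9765625 7
# 11 48828125 8
```

With seven decimal digits available, $2^{-10}$ is therefore the last negative power of two that is exactly representable, while $2^{-11}$ already needs eight digits.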

For a compound truncation conversion $Q$ through $S_{\beta_i}^{n_i}$, $i = 1, \ldots, k$, we have shown that $Q(x) = x$ if and only if $x$ is in the intersection of all significance spaces. Furthermore if at least one pair of the $\beta_i$ are not commensurable, then the intersection is finite and $Q(x) \ne x$ for any positive $x$ lower than the minimum positive element of the intersection. For such an $x$ the compound truncation conversion $Q$ would have $Q(x) > QQ(x) > QQQ(x) > \ldots > Q^{(i)}(x) > \ldots$, and in fact $\lim_{i \to \infty} Q^{(i)}(x) = 0$ since zero is the only finite accumulation point of $S_\beta^n$. Thus for all $i$, $Q^{(i)} \ne Q^{(i-1)}$ for any compound truncation conversion $Q$ involving at least two incommensurable bases, in sharp contrast to a compound truncation conversion through commensurable significance spaces where the iterated conversions immediately converged. Practically speaking, the successive updatings of a B.C.D. tape4 on a binary machine could cause some of the "constant" floating point data to be iteratively converted back-and-forth with each updating. If truncation conversion were adhered to as a standard, some of this data could drift lower in value (see figure 8), losing all accuracy, with no error indication provided by the system. Thus truncation conversion should be avoided in mixed base computation unless all bases are in the same commensurable family.
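The downward drift described above is easy to reproduce on a toy scale. The following sketch assumes that truncation conversion chops the magnitude to $n$ significant base-$\beta$ digits; the significances (5 bits, 2 decimal digits) and the starting value 0.1 are chosen only for illustration. The value 0.1 lies below 1/4, which by the prime power argument above should be the smallest positive element common to both spaces.

```python
from fractions import Fraction
from math import floor

def trunc_conv(x, base, n):
    """Sketch of T_base^n: chop |x| to n significant base-`base` digits."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    y, e = abs(x), 0
    lo, hi = Fraction(base) ** (n - 1), Fraction(base) ** n
    while y >= hi:            # normalize so that lo <= y < hi
        y /= base
        e += 1
    while y < lo:
        y *= base
        e -= 1
    return sign * floor(y) * Fraction(base) ** e

def Q(x):
    # Q = T_10^2 T_2^5: truncate to 5 bits, then to 2 decimal digits.
    return trunc_conv(trunc_conv(x, 2, 5), 10, 2)

values = [Fraction(1, 10)]
for _ in range(8):
    values.append(Q(values[-1]))
print([float(v) for v in values])   # starts 0.1, 0.097, 0.093, ...
```

Each pass loses a little more of the datum, and no pass signals an error, which is the behavior the text warns about.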

With rounding conversion the error accumulation upon successive conversions of a datum amongst incommensurable as well as commensurable bases is much better controlled. For example, it has been shown8 '1 that with suitably high significance in the intermediate space $S_\delta^m$, the 2-fold compound conversions $R_\beta^n R_\delta^m$ and $R_\beta^n T_\delta^m$ can both reduce to the identity on $S_\beta^n$,


Figure 8: The possible drift in value of a “constant datum” under iterated truncation conversion between incommensurable significance spaces.

Theorem 12 (In-and-Out Conversion Theorem): For $\beta$ and $\delta$ incommensurable,

i) $R_\beta^n R_\delta^m$ is the identity on $S_\beta^n \iff \delta^{m-1} > \beta^n$;

ii) $R_\beta^n T_\delta^m$ is the identity on $S_\beta^n \iff \delta^{m-1} \ge 2\beta^n - 1$.

Thus from theorem 12 we see that for certain compound conversions the invariant set will be the whole image space.
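Part (i) of the theorem can be probed numerically for two decimal digits and a binary intermediate space. This is a minimal sketch under stated assumptions: rounding is taken to the nearest representable value with ties rounded away from zero (the tie rule is an assumption of the sketch), and only a finite window of decimal exponents is examined, so a `True` result illustrates the theorem rather than proving it.

```python
from fractions import Fraction
from math import floor

def round_conv(x, base, n):
    """Sketch of R_base^n: nearest k * base**e with |k| < base**n."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    y, e = abs(x), 0
    lo, hi = Fraction(base) ** (n - 1), Fraction(base) ** n
    while y >= hi:            # normalize so that lo <= y < hi
        y /= base
        e += 1
    while y < lo:
        y *= base
        e -= 1
    return sign * floor(y + Fraction(1, 2)) * Fraction(base) ** e

def round_trip_is_identity(beta, n, delta, m, exponents):
    """Check R_beta^n R_delta^m (b) == b for normalized b in a window."""
    for j in exponents:
        for k in range(beta ** (n - 1), beta ** n):
            b = k * Fraction(beta) ** j
            if round_conv(round_conv(b, delta, m), beta, n) != b:
                return False
    return True

window = range(-3, 6)
print(round_trip_is_identity(10, 2, 2, 8, window))   # 2**7 = 128 > 100: True
print(round_trip_is_identity(10, 2, 2, 7, window))   # 2**6 = 64 < 100: False
```

With seven bits the datum 8500 is one witness of failure: it rounds to 8448 in binary and comes back as 8400.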

Corollary 12.1: For $\beta$ and $\delta$ incommensurable,

i) $\mathcal{I}(R_\beta^n R_\delta^m) = S_\beta^n \iff \delta^{m-1} > \beta^n$;

ii) $\mathcal{I}(R_\beta^n T_\delta^m) = S_\beta^n \iff \delta^{m-1} \ge 2\beta^n - 1$.

The In-and-Out Conversion Theorem has been stated for the case of incommensurable bases; however, incorporating the techniques of theorem 10, a more general result encompassing both commensurable and incommensurable bases can be conveniently derived for the 2-fold compound rounding conversion case.

Corollary 12.2: For $\beta, \delta \ge 2$, let $\gamma$ be the greatest common root of $\beta$ and $\delta$ when $\beta$ and $\delta$ are commensurable, and let $\gamma = 1$ otherwise. Then

$$R_\beta^n R_\delta^m \text{ is the identity on } S_\beta^n \iff \gamma\,\delta^{m-1} \ge \beta^n \qquad (18)$$

Proof: When $\beta$ and $\delta$ are incommensurable $\gamma = 1$, and $\delta^{m-1} \ne \beta^n$, so (18) follows from theorem 12. Otherwise, when $\beta$ and $\delta$ are commensurable with greatest common root $\gamma$, then $\delta = \gamma^i$, $\beta = \gamma^j$ where $i$ and $j$ are relatively prime. Hence $S_\beta^n \subset S_\gamma^{jn}$ and $S_\gamma^{(m-1)i+1} \subset S_\delta^m$. Now $\gamma\delta^{m-1} \ge \beta^n$ implies that $(m-1)i + 1 \ge jn$, so $S_\beta^n \subset S_\delta^m$ and $R_\beta^n R_\delta^m$ is clearly the identity on $S_\beta^n$. Alternatively assuming $\gamma\delta^{m-1} < \beta^n$ means that $(m-1)i + 1 < jn$. Now since $i$ and $j$ are relatively prime, there exists a $k$ such that $k \equiv j - 1 \pmod{j}$, $k \equiv 0 \pmod{i}$. But then $S_\beta^n = S_\gamma^{jn}$ over the interval $[\gamma^k, \gamma^{k+1}]$ and also $S_\delta^m = S_\gamma^{(m-1)i+1}$ over the same interval, so that $(m-1)i + 1 < jn$ means $S_\delta^m$ is contained in but not equal to $S_\beta^n$ over the interval $[\gamma^k, \gamma^{k+1}]$. Thus $R_\beta^n R_\delta^m$ can not be the identity on $S_\beta^n$, completing the corollary.

When the condition $\gamma\delta^{m-1} \ge \beta^n$ is not obtained, then clearly $S_\beta^n \cap S_\delta^m \subset \mathcal{I}(R_\beta^n R_\delta^m) \subsetneq S_\beta^n$, and it is of interest to give some alternative properties sufficient to characterize an invariant point of the compound conversion.

In general if the image space of the compound conversion $Q$ were exactly equal to $\mathcal{I}(Q)$, then $QQ = Q$, and a desirable situation controlling accumulated error is obtained. If the $k$-fold compound conversion $Q$ ends with a truncation conversion, we have shown that $QQ$ may not equal $Q$. Even if the final conversion of such a $Q$ is a rounding conversion, the image space of $Q$ can contain some points which are not invariant points of $Q$. For example, let $Q = R_3^2 R_2^4$. Since $2^{4-1} = 8 = 3^2 - 1$, by theorem 4 the mapping $R_3^2 \mid S_2^4 \to S_3^2$ is onto, so $Q$ covers all of $S_3^2$. However from corollary 12.1, $\mathcal{I}(R_3^2 R_2^4) \ne S_3^2$.

Some detailed criteria for determining invariant points of $R_\beta^n R_\delta^m$ will now be considered.


Lemma 13: Assume that $b \in S_\beta^n$, $b \ne \pm\beta^i$ for any $i$, and for some $d \in S_\delta^m$, $R_\beta^n(d) = b$. Then $R_\beta^n R_\delta^m(b) = b$, i.e. $b \in \mathcal{I}(R_\beta^n R_\delta^m)$.

Proof: Let $\beta$ and $\delta$ be commensurable of the $\gamma$ family. Let $b \in S_\beta^n$, $b \ne \pm\beta^i$ for any $i$, and assume $\gamma^j \le |b| \le \gamma^{j+1}$. Over the intervals $[-\gamma^{j+1}, -\gamma^{j}]$ and $[\gamma^{j}, \gamma^{j+1}]$ either $S_\beta^n \subset S_\delta^m$ or $S_\delta^m \subset S_\beta^n$, and in either case with $d \in S_\delta^m$, $R_\beta^n(d) = b$ implies that $b \in S_\beta^n \cap S_\delta^m$. Therefore $b \in \mathcal{I}(R_\beta^n R_\delta^m)$.

Alternatively let us assume that $\beta$ and $\delta$ are incommensurable. Let $0 < b \in S_\beta^n$ have predecessor $b^{-}$ and successor $b'$ in $S_\beta^n$, and assume that $b \ne \beta^i$ for any $i$. Then $b - b^{-} = b' - b$. We assume there exists a $d \in S_\delta^m$ such that $R_\beta^n(d) = b$, so

$$|d - b| \le (b' - b)/2 \qquad (20)$$

From the definition of rounding (10), it is evident that $|x - R_\delta^m(x)| = \min_{a \in S_\delta^m} \{|x - a|\}$. Therefore with $x = b$

$$|b - R_\delta^m(b)| \le |b - d| \qquad (21)$$

and again, applying the same property of $R_\beta^n$ with $x = R_\delta^m(b)$,

$$|R_\delta^m(b) - R_\beta^n R_\delta^m(b)| \le |R_\delta^m(b) - b| \qquad (22)$$

and finally combining (20)-(22)

$$|b - R_\beta^n R_\delta^m(b)| \le b' - b \qquad (23)$$

Since $R_\beta^n R_\delta^m(b) \in S_\beta^n$, it can differ from $b$ only if equality holds in (23), and then (20), (21) and (22) must all be equalities. Therefore $|d - R_\delta^m(b)| = b' - b$, and there could be no members of $S_\delta^m$ between $d$ and $R_\delta^m(b)$. Also then

$$b = (d + R_\delta^m(b))/2 \qquad (24)$$

so that $R_\delta^m(b)$ would be the successor of $d$ in $S_\delta^m$. Now any element and its successor in $S_\delta^m$ must differ by an amount $\delta^j$ for some $j$. Similarly $b' - b = \beta^i$ for some $i$, so that with equality in (23) $b' - b = \beta^i = \delta^j$. But then $i = j = 0$, since $\beta$ and $\delta$ have been assumed incommensurable. Now $b' - b = \beta^0 = 1$ only if $b$ and $b'$ are consecutive integers, and similarly $d$ and $R_\delta^m(b)$ would be consecutive integers, contradicting (24). Therefore (23) must be a strict inequality, so that $R_\beta^n R_\delta^m(b)$ must equal $b$. The case for negative $b \in S_\beta^n$ follows from sign complementarity, completing the lemma.

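As a corollary to this lemma it is now shown that a comparison of the gap functions $\Gamma_\beta^n$ and $\Gamma_\delta^m$ will suffice to determine many of the invariant points of $R_\beta^n R_\delta^m$.

Corollary 13.1: Assume $b \in S_\beta^n$, $b \ne \pm\beta^i$ for any $i$, and $\Gamma_\beta^n(b) \ge \Gamma_\delta^m(b)$. Then $b \in \mathcal{I}(R_\beta^n R_\delta^m)$.

Proof: Let $b < b' < b''$ be successive members of $S_\beta^n$. If $0 < b' \in S_\beta^n$ with $b' \ne \beta^i$ for any $i$, then $b' - b = b'' - b'$ and the interval mapping into $b'$ under $R_\beta^n$ is the half open-half closed interval $[(b' + b)/2, (b'' + b')/2)$. Suppose no member of $S_\delta^m$ falls in this interval; then

$$\max\{d \mid d < b',\ d \in S_\delta^m\} < (b' + b)/2$$

$$\min\{d \mid d > b',\ d \in S_\delta^m\} \ge (b'' + b')/2$$

so that evaluating $\Gamma_\delta^m$ at the real number $b'$ (see definition (6))

$$\Gamma_\delta^m(b') > (b'' - b)/2 = \Gamma_\beta^n(b')$$

Thus if $\Gamma_\beta^n(b') \ge \Gamma_\delta^m(b')$, then some member of $S_\delta^m$ must map into $b'$ under $R_\beta^n$, and by lemma 13 $b' \in \mathcal{I}(R_\beta^n R_\delta^m)$. The result for negative $b \in S_\beta^n$, $b \ne -\beta^i$, then follows from sign complementarity.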


Thus it is readily possible to find sequences of invariant points of $R_\beta^n R_\delta^m$ from a gap function comparison.

Corollary 13.2: For $\beta^i < \delta^j$ assume that $\Gamma_\beta^n \ge \Gamma_\delta^m$ over the open interval $(\beta^i, \min\{\beta^{i+1}, \delta^j\})$. Then

$$(\beta^i, \min\{\beta^{i+1}, \delta^j\}) \cap S_\beta^n \subset \mathcal{I}(R_\beta^n R_\delta^m)$$

Referring back to figure 2, it is evident from corollary 13.2 that all members of the hexadecimal space $S_{16}^{n}$ of that figure greater than .0625 and less than 0.1 are invariant points of the corresponding conversion $R_{16}^{n} R_{10}^{m}$. Generally not all members of $S_\beta^n$ from an interval, $I$, where $\Gamma_\beta^n < \Gamma_\delta^m$, will be members of $\mathcal{I}(R_\beta^n R_\delta^m)$, but certainly any point of $R_\beta^n(S_\delta^m)$ other than a power of $\beta$ must be an invariant point of $R_\beta^n R_\delta^m$. Since $R_\beta^n$ restricted to $S_\delta^m \cap I$ is one-to-one into $S_\beta^n$, we may surmise that the number of members of $\mathcal{I}(R_\beta^n R_\delta^m)$ in a neighborhood of $x$ is comparable to the lesser of the numbers of members of $S_\beta^n$ and of $S_\delta^m$ in that neighborhood. Hence although the members of $\mathcal{I}(R_\beta^n R_\delta^m)$ are more erratically spaced, the relative difference between neighboring points of $\mathcal{I}(R_\beta^n R_\delta^m)$ will still be bounded, with a bound larger, but not by an order of magnitude, than the worst case in $S_\beta^n$ and $S_\delta^m$.

The preceding lemma and its corollaries do not quite provide the full story about $\mathcal{I}(R_\beta^n R_\delta^m)$, since the integral powers of $\beta$ and $\delta$ represent break points in their respective gap functions and by the preceding theory they must be treated separately. We have already pointed out that the image space of $Q$ need not be identical to $\mathcal{I}(Q)$ even for $Q = R_\beta^n R_\delta^m$, so the question of whether the iterates of $Q$, namely $Q, QQ, QQQ, \ldots, Q^{(k)}, \ldots$, will converge is still unanswered for $Q = R_\beta^n R_\delta^m$. This question is a very practical one since we would like to know if iterated rounding conversion between binary and decimal based systems will allow indefinite drift in the value of a “constant” as did truncation conversion, or if a stable pair of values must be achieved after a fixed number of rounding conversions back and forth. The following theorem is a surprisingly general and reassuring answer to this question.

Theorem 14 (Iterated Conversion Theorem): Let $Q = R_\delta^m R_\beta^n$ where $m, n \ge 2$. Then $QQQ = QQ$. Furthermore this result is best possible in the sense that there exist $\beta, \delta, n, m \ge 2$ such that $R_\beta^n QQ \ne R_\beta^n Q$.

Proof: For any real $x$, $R_\delta^m R_\beta^n R_\delta^m R_\beta^n(x) = R_\delta^m R_\beta^n(x)$ unless $R_\delta^m R_\beta^n(x) = \pm\delta^i$ for some $i$, by lemma 13. Similarly with $Q = R_\delta^m R_\beta^n$, $QQQ(x) = QQ(x)$ unless $QQ(x) = \pm\delta^j$ with $j \ne i$. Assuming $QQQ(x) \ne QQ(x)$, the definition of rounding conversion assures that

$$|R_\delta^m R_\beta^n(x) - R_\beta^n(x)| \ge |R_\beta^n R_\delta^m R_\beta^n(x) - R_\delta^m R_\beta^n(x)| \ge |R_\delta^m R_\beta^n R_\delta^m R_\beta^n(x) - R_\beta^n R_\delta^m R_\beta^n(x)|$$

so that with $|Q(x)| = \delta^i$, $|QQ(x)| = \delta^j \ne \delta^i$,

$$|QQ(x) - Q(x)| \le 2\,|R_\delta^m R_\beta^n(x) - R_\beta^n(x)| \le \delta^i(1 + \delta^{1-m}) - \delta^i$$

and also

$$|QQ(x) - Q(x)| = |\delta^i - \delta^j| = \delta^i\,|1 - \delta^{j-i}|$$

and since $m \ge 2$ by assumption in the theorem

$$|1 - \delta^{j-i}| < \delta^{1-m} \le 1/\delta \le 1/2$$

But $|1 - \delta^{j-i}|$ has a positive integer value for $j > i$, and for $j < i$ the smallest value of $|1 - \delta^{j-i}|$ is $1 - 1/\delta \ge 1/2$, and this is a contradiction. Hence $QQQ(x) = QQ(x)$ for all $x$.

The remainder of the theorem, demonstrating that $R_\beta^n QQ \ne R_\beta^n Q$ for some $n, m, \beta, \delta \ge 2$, is shown in the example of figure 9.
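The statement $QQQ = QQ$ can be observed on a small instance. The sketch below takes $Q = R_{10}^{2} R_{2}^{5}$ (an illustrative choice, not the example of figure 9) and assumes rounding to nearest with ties away from zero; the datum 1024 shows that $QQ$ may differ from $Q$ while the third application changes nothing.

```python
from fractions import Fraction
from math import floor

def round_conv(x, base, n):
    """Sketch of R_base^n: nearest k * base**e with |k| < base**n."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    y, e = abs(x), 0
    lo, hi = Fraction(base) ** (n - 1), Fraction(base) ** n
    while y >= hi:            # normalize so that lo <= y < hi
        y /= base
        e += 1
    while y < lo:
        y *= base
        e -= 1
    return sign * floor(y + Fraction(1, 2)) * Fraction(base) ** e

def Q(x):
    # Round to 5 bits first, then to 2 decimal digits.
    return round_conv(round_conv(x, 2, 5), 10, 2)

x = Fraction(1024)
q1, q2, q3 = Q(x), Q(Q(x)), Q(Q(Q(x)))
print(q1, q2, q3)             # 1000 990 990
```

The first pass lands on a power of ten, which is exactly the exceptional case left open by lemma 13, and the second pass moves the value once more before it stabilizes.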


Figure 9: Iterated conversions of x = 1,120,000 by the compound conversion $Q$, showing that $R_\beta^n QQ \ne R_\beta^n Q$. (Vertical axis from 960,000 to 1,200,000.)


Computer hardware involving the bases 2, 8, and 16 of the binary family and the base 10 is in popular usage. Our results so far discuss accumulated error under compound conversion (1) within a commensurable family and (2) between two incommensurable significance spaces. The modification of our results to cover compound conversion between more than two significance spaces all from just two different commensurable families is straightforward, so that our theory does cover the current situations where one might expect to encounter mixed base computation.

If a suitable 3-state device were perfected for use in computer hardware, a ternary based floating point system would undoubtedly be implemented and the possibility of mixed base computation amongst the bases 2, 3 and 10 would ensue. Our final thoughts will then be concerned with compound conversion through a collection of incommensurable significance spaces.

For the significance spaces $S_{11}^{5}$, $S_{2}^{14}$, and $S_{5}^{7}$, we have $1{,}960{,}563 = (111A000)_{11} \in S_{11}^{5}$, $1{,}960{,}576 = (1\,11011\,11010\,10100\,00000)_2 \in S_{2}^{14}$, and $1{,}960{,}500 = (1000214000)_5 \in S_{5}^{7}$, and the absolute difference between an element and its successor in this neighborhood is 121, 128, and 125 respectively (see figure 10). Thus the iterates of the compound rounding conversion $Q = R_{5}^{7} R_{2}^{14} R_{11}^{5}$ in this neighborhood will be different for a number of cycles. As seen in figure 10, $Q^{(7)}(1{,}960{,}563) = 1{,}961{,}375$, and $Q^{(k)}(1{,}960{,}563) = 1{,}961{,}500$ for $k \ge 8$, hence $Q^{(k)} \ne Q^{(k-1)}$ for at least all $k \le 8$. Examples such as this one show that utilizing rounding conversion is not enough to successfully restrict the drift in value of a constant datum under compound conversion. From the generalized result on In-and-Out Conversion given in corollary 12.2, a resolution to the problem of controlling overall error growth in the presence of more than two incommensurable bases by intermediate reconversions to a standard significance space is feasible.
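The drift quoted above can be reproduced with exact arithmetic. The sketch below assumes that $Q$ rounds first into $S_{11}^{5}$, then into $S_{2}^{14}$, then into $S_{5}^{7}$, with rounding to nearest and ties away from zero (no ties actually occur in this run); the order of the three conversions is inferred from the quoted values rather than read off the source.

```python
from fractions import Fraction
from math import floor

def round_conv(x, base, n):
    """Sketch of R_base^n: nearest k * base**e with |k| < base**n."""
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    y, e = abs(x), 0
    lo, hi = Fraction(base) ** (n - 1), Fraction(base) ** n
    while y >= hi:            # normalize so that lo <= y < hi
        y /= base
        e += 1
    while y < lo:
        y *= base
        e -= 1
    return sign * floor(y + Fraction(1, 2)) * Fraction(base) ** e

def Q(x):
    # Base 11 (5 digits), then base 2 (14 bits), then base 5 (7 digits).
    return round_conv(round_conv(round_conv(x, 11, 5), 2, 14), 5, 7)

x = Fraction(1960563)
iterates = []
for _ in range(10):
    x = Q(x)
    iterates.append(int(x))
print(iterates)
# [1960625, 1960750, 1960875, 1961000, 1961125,
#  1961250, 1961375, 1961500, 1961500, 1961500]
```

Each cycle raises the value by one unit of the base 5 spacing (125) until the eighth iterate, after which the three roundings return to the same value.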



Specifically let $S_{\beta_i}^{n_i}$, $1 \le i \le k$, be a collection of significance spaces representing the different floating point data formats of a mixed base computational environment. Suppose we introduce an intermediate space, $S_\delta^m$, with the significance $m$ small enough such that $R_\delta^m R_{\beta_i}^{n_i}$ is the identity on $S_\delta^m$ for all $i$. Then let all data introduced into the mixed base computational environment first be converted by rounding to $S_\delta^m$, and let subsequent conversions from $S_{\beta_i}^{n_i}$ to $S_{\beta_j}^{n_j}$ be preceded by a reconversion to $S_\delta^m$ (i.e. $R_{\beta_j}^{n_j} R_\delta^m \mid S_{\beta_i}^{n_i} \to S_{\beta_j}^{n_j}$). Note that the conversion from $S_{\beta_i}^{n_i}$ to $S_\delta^m$ always regenerates the same value, $R_\delta^m(x)$, in $S_\delta^m$ for an initial datum $x$, since $R_\delta^m R_{\beta_i}^{n_i}$ is the identity on $S_\delta^m$. Thus the value of the initial datum $x$ whenever encountered in $S_{\beta_i}^{n_i}$, even after numerous intermediate conversions, is always $R_{\beta_i}^{n_i} R_\delta^m(x)$ [...], which provides a highly desirable property for mixed base computation.

Now the range of possible values achievable in $S_{\beta_i}^{n_i}$ is $R_{\beta_i}^{n_i}(S_\delta^m)$, and it is of course desirable to have as large an $m$ as possible so that the conversion through $S_\delta^m$ does not introduce too much error. Yet it should be kept in mind that if $m$ is chosen so large that one of the $R_\delta^m R_{\beta_i}^{n_i}$ is not the identity on $S_\delta^m$, then conversion error may accumulate and generate a greater overall error than if a smaller $m$ (meaning less initial accuracy) were chosen. These observations can be formalized as an additional corollary to theorem 12 and corollary 12.2.


Corollary 12.3: For $S_{\beta_i}^{n_i}$, $1 \le i \le k$, let $\gamma_i$ be the greatest common root of $\delta$ and $\beta_i$ when they are commensurable, and let $\gamma_i$ be unity otherwise. Let

$$m \le \min_i \{(n_i - 1)\log_\delta \beta_i + \log_\delta \gamma_i\} \qquad (25)$$

and let $Q_i = R_{\beta_i}^{n_i} R_\delta^m$ for $1 \le i \le k$. If $Q = Q_j Q'$ for any $1 \le j \le k$, where $Q'$ is composed from the mappings $Q_i$, $1 \le i \le k$, then $Q = Q_j$.
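The bound (25) can be evaluated without logarithms by using the equivalent integer condition $\delta^m \le \gamma_i\,\beta_i^{\,n_i - 1}$. The following is a minimal sketch; the helper names and the three sample formats (24 bits, 6 hexadecimal digits, 8 decimal digits) with a decimal intermediate space are hypothetical choices made for illustration.

```python
def greatest_common_root(a, b):
    """Largest g such that a and b are both integer powers of g, else 1."""
    def is_power(x, g):
        while x % g == 0:
            x //= g
        return x == 1
    for g in range(min(a, b), 1, -1):
        if is_power(a, g) and is_power(b, g):
            return g
    return 1

def max_intermediate_significance(delta, formats):
    """Largest m with delta**m <= gamma_i * beta_i**(n_i - 1) for all i."""
    best = None
    for beta, n in formats:
        cap = greatest_common_root(delta, beta) * beta ** (n - 1)
        m = 0
        while delta ** (m + 1) <= cap:
            m += 1
        best = m if best is None else min(best, m)
    return best

formats = [(2, 24), (16, 6), (10, 8)]    # (base, significance) pairs
print(max_intermediate_significance(10, formats))   # 6
```

In this illustration the binary and hexadecimal formats each allow at most six decimal digits in the intermediate space, even though the decimal format itself carries eight.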


Figure 10: Iterates of the compound rounding conversion $Q = R_{5}^{7} R_{2}^{14} R_{11}^{5}$, showing the drift in value of a “constant datum” under successive conversions.

The fact that $m$ must be bounded from above in (25) in order to guarantee control of accumulated error and avoid situations such as that exhibited in figure 10 demonstrates that the phrase "carry more digits" does not always mean that greater overall accuracy will follow, and such cliches should not be used as a substitute for a true understanding of the formal structure of floating point number systems and base conversion.